Taggers Gonna Tag: An Argument against Evaluating Disambiguation Capacities of Morphosyntactic Taggers

Authors

  • Adam Radziszewski
  • Szymon Acedanski
Abstract

Tagging of inflectional languages is usually performed in two stages: morphological analysis and morphosyntactic disambiguation. A number of published papers limit their evaluation to the second stage, without asking what a tagger is supposed to do. In this article we highlight this important question and discuss possible answers. We also argue that a fair evaluation requires assessment of the whole system, which is very rarely done in the literature. Finally, we show results of the full evaluation of three Polish morphosyntactic taggers. The discrepancy between our results and those published earlier is striking, showing that these issues do make a practical difference.
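
The gap between the two kinds of evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' evaluation procedure: the `Token` structure, the function names, and the sample tags are all hypothetical, and the sketch assumes that the only source of analysis error is a morphological analyser that fails to offer the correct tag.

```python
from dataclasses import dataclass


@dataclass
class Token:
    gold: str         # reference tag from the manually annotated corpus
    candidates: set   # tags proposed by morphological analysis (hypothetical)
    predicted: str    # tag finally chosen by the disambiguator


def disambiguation_accuracy(tokens):
    """Accuracy restricted to tokens whose gold tag was offered by the analyser."""
    scored = [t for t in tokens if t.gold in t.candidates]
    if not scored:
        return 0.0
    return sum(t.predicted == t.gold for t in scored) / len(scored)


def full_system_accuracy(tokens):
    """Accuracy over every token, so analysis errors count as tagging errors."""
    if not tokens:
        return 0.0
    return sum(t.predicted == t.gold for t in tokens) / len(tokens)


# Illustrative data only; the tag strings are made up for this sketch.
sample = [
    Token("subst:sg:nom", {"subst:sg:nom", "subst:sg:acc"}, "subst:sg:nom"),
    Token("adj:sg:nom", {"adj:sg:nom", "adj:sg:acc"}, "adj:sg:nom"),
    Token("fin:sg:ter", {"fin:sg:ter"}, "fin:sg:ter"),
    # The analyser never proposed the gold tag, so the disambiguator cannot win.
    Token("subst:sg:gen", {"adj:sg:gen"}, "adj:sg:gen"),
]

print(disambiguation_accuracy(sample))
print(full_system_accuracy(sample))
```

On this toy sample the disambiguation-only figure ignores the fourth token entirely, while the full-system figure counts it as an error, which is the kind of discrepancy the abstract refers to.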

Similar resources

Towards the Adequate Evaluation of Morphosyntactic Taggers

There exists a well-established and almost unanimously adopted measure of tagger performance, namely, accuracy. Although it is perfectly adequate for small tagsets and typical approaches to disambiguation, we show that it is deficient when applied to rich morphological tagsets and propose various extensions designed to better correlate with the real usefulness of the tagger.

Improving Morphosyntactic Tagging of Slovene Language through Meta-tagging

Part-of-speech (PoS) or, better, morphosyntactic tagging is the process of assigning morphosyntactic categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task since the size of the tagset is over one thousand tags (as opposed to English, where the size is typically around sixty) and the sta...

Morphosyntactic Tagging of Slovene Using Progol

We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Prog...

Combining Pos Taggers for Improved Accuracy to Create Telugu Annotated Texts for Information Retrieval

POS tagging is the process of assigning a correct POS tag (a noun, verb, adjective, adverb, or other lexical category marker) to each word of a sentence. POS taggers are developed by modeling the morpho-syntactic structure of natural language text. We attempted to improve the accuracy of existing Telugu POS taggers by using a voting algorithm. The three Telugu POS taggers viz., (1) Ru...

Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets

The paper evaluates tagging techniques on a corpus of Slovene, where we are faced with a large number of possible word-class tags and only a small (hand-tagged) dataset. We report on training and testing of four different taggers on the Slovene MULTEXT-East corpus containing about 100,000 words and 1000 different morphosyntactic tags. Results show, first of all, that training times of the Maxim...

Publication date: 2012